APHID: An architecture for private, high-performance integrated data mining
Abstract
While the emerging field of privacy-preserving data mining (PPDM) will enable many new data mining applications, it suffers from several practical difficulties. PPDM algorithms are challenging to develop and computationally intensive to execute. Developers need convenient abstractions to simplify the engineering of PPDM applications. The individual parties involved in the data mining process need a way to bring high-performance, parallel computers to bear on the computationally intensive parts of PPDM tasks. This paper discusses APHID (Architecture for Private and High-performance Integrated Data mining), a practical architecture and software framework for developing and executing large-scale PPDM applications. At one tier, the system supports simplified use of cluster and grid resources, and at another tier, the system abstracts communication for easy PPDM algorithm development. This paper offers a detailed analysis of the challenges in developing PPDM algorithms with existing frameworks, and motivates the design of a new infrastructure based on these challenges.
Related resources
APHID: A Practical Architecture for High-Performance, Privacy-Preserving Data Mining
While the emerging field of privacy preserving data mining (PPDM) will enable many new data mining applications, it suffers from several practical difficulties. PPDM algorithms are difficult to develop and computationally intensive to execute. Developers need convenient abstractions to reduce the costs of engineering PPDM applications. The individual parties involved in the data mining process ...
An Integrated DEA and Data Mining Approach for Performance Assessment
This paper presents a data envelopment analysis (DEA) model combined with bootstrapping to assess the performance of one of the data mining algorithms. We applied a two-step process for performance productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...
Darwin: A Scalable Integrated System for Data Mining
Darwin is a high-performance scalable integrated system for Data Mining and Knowledge Discovery in large databases. In this paper we present an overview of Darwin’s philosophy, architecture and functionality. We also describe the application of Darwin to selected datasets.
An Architecture for Security and Protection of Big Data
The issue of online privacy and security is a challenging subject, as it concerns the privacy of data that are increasingly accessible via the internet. In other words, people who intend to access the private information of other users can do so more efficiently over the internet. This study is an attempt to address the privacy issue of distributed big data in the context of cloud computin...
Porosity Rendering in High-Performance Architecture: Wind-Driven Natural Ventilation and Porosity Distribution Patterns
Natural ventilation is one of the most essential issues in the concept of high-performance architecture. Porosity plays a large role in wind-phil architecture, both in achieving high efficiency in integrated architectural design and in realizing a high-performance building. Natural ventilation performance in porous buildings is influenced by a wide range of interre...
Journal:
- Future Generation Comp. Syst.
Volume 26
Publication date: 2010